Recognizing Coordinate Structures for Machine Translation of English Patent Documents
Abstract
Patent machine translation is one of the main target areas of current practical machine translation (MT) systems. Patent documents have their own peculiar descriptive style. In particular, the abstracts and claims of patent documents are characterized by long and complex syntactic structures, which are often caused by coordination. Syntactic analysis of patent documents therefore requires special treatment of coordination. This paper describes a method for dealing with long sentences in patent documents by recognizing coordinate structures. Coordinate structures are recognized using a similarity table that reflects the parallelism between conjuncts. The method is applied to a practical MT system and improves its quality and efficiency.
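The idea of a similarity table can be illustrated with a minimal sketch. This is not the authors' implementation: the scoring values, the function names (similarity, build_similarity_table, find_conjuncts), the Penn-Treebank-style tags, and the restriction to equal-length conjuncts are all assumptions made for illustration. The table holds a score for every pair of tokens on either side of a conjunction, and the conjunct boundaries are chosen where the aligned scores are highest.

```python
def similarity(a, b):
    """Score two (word, tag) tokens. Hypothetical weights:
    identical word = 2, same POS tag = 1, mismatch = -1."""
    if a[0].lower() == b[0].lower():
        return 2
    if a[1] == b[1]:
        return 1
    return -1


def build_similarity_table(left, right):
    """table[i][j] compares the i-th token before the conjunction
    with the j-th token after it."""
    return [[similarity(l, r) for r in right] for l in left]


def find_conjuncts(tokens, conj_index):
    """Return (pre_conjunct_words, post_conjunct_words).

    Simplifying assumption: both conjuncts have the same length k and
    are adjacent to the conjunction; k is chosen to maximize the sum
    of position-by-position similarity scores.
    """
    left = tokens[:conj_index]
    right = tokens[conj_index + 1:]
    table = build_similarity_table(left, right)

    best_k, best_score = 0, 0  # only a positive score counts as parallelism
    for k in range(1, min(len(left), len(right)) + 1):
        start = len(left) - k
        score = sum(table[start + t][t] for t in range(k))
        if score > best_score:
            best_k, best_score = k, score

    pre = [w for w, _ in left[len(left) - best_k:]] if best_k else []
    post = [w for w, _ in right[:best_k]]
    return pre, post


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("device", "NN"), ("comprises", "VBZ"),
        ("a", "DT"), ("memory", "NN"), ("for", "IN"),
        ("storing", "VBG"), ("data", "NNS"),
        ("and", "CC"),
        ("a", "DT"), ("processor", "NN"), ("for", "IN"),
        ("executing", "VBG"), ("instructions", "NNS"),
    ]
    print(find_conjuncts(sentence, 8))
```

In this toy sentence the two conjuncts share the same tag sequence, so the best alignment covers "a memory for storing data" and "a processor for executing instructions". A practical system would need to handle conjuncts of unequal length and nested coordination, which this sketch does not attempt.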
Similar resources
Customizing an English-Korean Machine Translation System for Patent/Technical Documents Translation
This paper addresses a method for customizing an English-Korean machine translation system from the general domain to the patent or technical document domain. The customizing method includes the following: (1) adapting the probabilities of a POS tagger trained on the general domain to the specific domain, (2) syntactically analyzing long and complex sentences by recognizing coordinate structures, and (3) ...
English-Korean Patent Translation System: FromTo-EK/PAT
This paper addresses a method for customizing an English-Korean machine translation system from the general domain to the patent domain. The customizing method includes the following: (1) extracting and constructing large bilingual terminology and the patent-specific translation patterns, (2) adapting the probabilities of a POS tagger trained on the general domain to the patent domain, (3) syntactically a...
Customizing an English-Korean Machine Translation System for Patent Translation
This paper addresses a method for customizing an English-to-Korean machine translation system from the general domain to the patent domain. The customizing method consists of the following steps: 1) linguistically studying the characteristics of patent documents, 2) extracting unknown words from large patent documents and constructing large bilingual terminology, 3) extracting and constructing the patent...
Toward the Evaluation of Machine Translation Using Patent Information
To aid research and development in machine translation, we have produced a test collection for Japanese/English machine translation. To obtain a parallel corpus, we extracted patent documents for the same or related inventions published in Japan and the United States. Our test collection includes approximately 2 000 000 sentence pairs in Japanese and English, which were extracted automatically ...
Building a Statistical Machine Translation System for Translating Patent Documents
This paper describes the work we conducted for building a statistical machine translation (SMT) system for the Chinese-English subtask of the NTCIR-9 patent MT evaluation. Our results show that most of the generic techniques we had developed for improving SMT performance work on patent data as well, and the changes we made to our SMT system training procedure in order to address special charact...
Publication date: 2008